Hot spots of unseen fishing vessels

Illegal, unreported, and unregulated (IUU) fishing incurs an annual cost of up to US$25 billion in economic losses, results in substantial losses of aquatic life, and has been linked to human rights violations. Vessel tracking data from the automatic identification system (AIS) are powerful tools for combating IUU, yet AIS transponders can be disabled, reducing its efficacy as a surveillance tool. We present a global dataset of AIS disabling in commercial fisheries, which obscures up to 6% (>4.9 M hours) of vessel activity. Disabling hot spots were located near the exclusive economic zones (EEZs) of Argentina and West African nations and in the Northwest Pacific, all regions of IUU concern. Disabling was highest near transshipment hot spots and near EEZ boundaries, particularly contested ones. We also found links between disabling and location hiding from competitors and pirates. These inferences on where and why activities are obscured provide valuable information to improve fisheries management.


AIS datasets
In 2017, approximately 60,000 active fishing vessels broadcast their position over the Automatic Identification System (AIS), representing roughly two percent of the world's 2.8 million fishing vessels. This AIS fleet is biased toward large vessels: it covers the majority (52-85%) of vessels over 24 m in length, considerably fewer (14-19%) of vessels between 12 and 24 m, and hardly any (<0.4%) of vessels less than 12 m (20). The high coverage of larger vessels is primarily the result of flag states adopting AIS measures stricter than the Convention for the Safety of Life at Sea (SOLAS), the international regulation governing AIS, which explicitly exempts fishing vessels. As a result, the AIS fleet is also biased toward vessels from upper/middle income countries and distant water fleets, as vessels fishing far from shore tend to be larger.
We acquired over 28 billion AIS messages from Global Fishing Watch (GFW) from 2017-2019. This time range was selected because, starting in 2017, the GFW AIS database contains data from both the Spire and Orbcomm AIS providers, providing a more complete dataset on AIS usage, and because model training and validation relies on a third party subset of AIS data provided by exactEarth for 2017-2019. Each AIS message consists of a timestamp, speed, course, location, and information on vessel identity such as Maritime Mobile Service Identity (MMSI) number and flag state. These data were processed using two convolutional neural networks to identify vessel characteristics and type of fishing activity, such as vessel length, tonnage, fishing gear type, and gear deployment events (for full details on these neural networks see (7)). These data were filtered to fishing vessels, as well as carrier vessels involved in transshipment, and further cleaned to remove highly inactive vessels (vessels with less than five days of AIS activity and less than one day of fishing in a year). Vessels that were offsetting their location or where multiple vessels were using a single MMSI number were also removed, which eliminated a few hundred vessels from the analysis. These vessels were identified as described in (20).
While we include all fishing vessels in our analysis, we focus on four classes of fishing vessels responsible for a large majority of suspected disabling events: trawlers, drifting longlines, tuna purse seines, and squid jiggers. Trawlers are defined as vessels that fish by towing a net through the water and include both bottom and mid-water trawlers. Drifting longline vessels deploy longlines attached to buoys that drift, while tuna purse seines are defined by GFW as large purse seine vessels primarily fishing for tuna. Lastly, squid jiggers include vessels that fish while stationary using lights to target squid and other species, notably Pacific saury. The analysis also considers proximity of disabling events to loitering behavior by refrigerated cargo vessels capable of accepting transshipments of catch at sea.
The data were summed across the time-series, aggregated into quarter degree rasters (the coarsest resolution of the environmental datasets) and summarized using three fields related to fleet behavior: vessel days (Fig. S1A), fishing days (Fig. S1B), and suspected AIS disabling events (Fig. S2). Vessel days were calculated as the sum of days across the time-series that each fishing vessel of a fleet was present within a grid cell. Fishing days were calculated as the sum of days across the time-series that each fishing vessel of a fleet spent actively fishing within each grid cell. These data were also processed into suspected AIS disabling events, i.e., track segments with missing AIS data where we determined the AIS device was likely to have been intentionally disabled (Section 3). The analyses below were conducted using the R statistical computing environment, version 4.0.4 (41), and Python 3.7 (42).

1.1 50 nautical mile restriction
AIS messages are recorded by terrestrial (T-AIS) and satellite (S-AIS) receivers. However, these receiver types differ in the range and consistency with which they can receive AIS messages. AIS was designed for line of sight communications, i.e., for communicating with nearby vessels within a few tens of kilometers. AIS devices continuously synchronize with nearby devices to organize their messages into one of 2,250 time slots, ensuring that all messages are received by all AIS devices within range. In this respect, T-AIS receivers will perform like AIS devices on vessels, receiving all messages from vessels within range, which is influenced by the elevation of the T-AIS antennae. In the GFW data, 99% of AIS messages recorded by T-AIS receivers come from vessels within 50 nautical miles of shore. Similarly, MarineTraffic reports the majority of their T-AIS receivers fully cover a range of 40 nautical miles. In contrast, satellites can receive messages from vessels within a few thousands of kilometers, an area far too large for all AIS devices to be synchronized. In areas of high vessel density, these messages can interfere with each other, causing the satellite to receive fewer AIS messages from each vessel. As a result, in areas with more vessels (usually near the coasts in highly developed regions) the number of AIS messages received for a vessel will generally decline as the vessel gets closer to shore before coming within range of a terrestrial receiver. Conversely, a vessel heading out to sea may see its AIS message count drop dramatically once out of range of terrestrial receivers. The effect of this steep gradient between terrestrial and satellite reception is to create long gaps in a vessel's AIS data. Therefore, we restrict our analysis to AIS gaps that start and end in waters more than 50 nautical miles from shore in order to eliminate these reception gaps.

2. Satellite AIS reception quality

2.1 Observed satellite AIS reception quality
Due to its design as a collision avoidance system, vessels broadcast AIS messages more frequently at higher speeds. To calculate the observed satellite AIS reception quality in an area, measured as the average number of AIS messages received by satellites per vessel per day, we first filtered the AIS dataset to only consider positions with speeds greater than two knots for Class B and speeds under 14 knots while not at anchor for Class A. At these speeds, Class B broadcasts once every 30 seconds and Class A usually broadcasts once every 10 seconds. Below two knots, Class B devices broadcast every three minutes and above 14 knots Class A devices transmit more frequently than once every 10 seconds. Fishing vessels rarely travel faster than 14 knots, and therefore we did not lose many AIS positions due to this cutoff. Imposing these speed filters is important because counts of AIS messages are influenced by how vessels commonly operate in an area (e.g. high speed steaming in shipping lanes, loitering near ports). The majority of AIS messages are from non-fishing vessels, and thus these vessels will dominate reception calculations (7). Building a reception map that included vessels broadcasting outside these bounds would thus create an overly pessimistic reception map for Class B fishing vessels and an optimistic reception map for Class A fishing vessels, as many cargo vessels in shipping lanes travel faster than this speed and thus broadcast more frequently.
Next, to account for areas where no AIS positions were received, we linearly interpolated the hourly position of every vessel. GFW uses a segmentation algorithm when processing AIS data to assign positions to logical tracks by MMSI (7). The algorithm assigns subsequent AIS positions to a new track segment after a 24 hour gap in AIS messages, and hourly interpolation was not performed between track segments. We chose not to remove AIS disabling events prior to inferring AIS reception for multiple reasons. First, we felt estimates of satellite reception were a prerequisite for identifying suspected disabling events and expected that the number of AIS gaps far exceeded the number of actual disabling events. Indeed, while suspected disabling events are ~10% of all AIS gaps by fishing vessels in the study area (>50 nm from shore), they are just one percent of all fishing vessel gaps greater than 12 hours in the AIS data. Second, due to challenges with terrestrial and satellite AIS reception nearshore, this analysis only focuses on identifying suspected disabling events more than 50 nm from shore. Thus, only suspected disabling events >50 nm offshore could be removed prior to inferring reception quality, which would create discrepancies in our estimates of nearshore (< 50 nm) versus offshore (> 50 nm) reception.
The observed reception quality was then calculated separately for Class A and Class B as the average number of AIS positions received by vessels at one degree resolution (Fig. S3). For both Class A and B devices, S-AIS reception quality decreases close to shore in many areas due to signal interference (see Section 1.1). Importantly, Class A AIS devices have a higher output power than Class B (12.5 versus 2 watts), which also increases the likelihood Class A messages will be received by satellites. The calculation and subsequent modeling of reception quality was performed for each month in the dataset (2017-2019) to account for temporal changes in reception as more AIS satellites came online.
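The speed filter and per-cell reception calculation described above can be sketched as follows. This is a minimal illustration, not the GFW pipeline: the function names are hypothetical, and the use of hourly interpolated vessel hours as the denominator is an assumption based on the interpolation step described in this section.

```python
import math


def keep_for_reception(ais_class: str, speed_knots: float, at_anchor: bool = False) -> bool:
    """Return True if a position should count toward observed reception quality.

    Class B: speeds greater than two knots.
    Class A: speeds under 14 knots while not at anchor.
    """
    if ais_class == "B":
        return speed_knots > 2.0
    if ais_class == "A":
        return speed_knots < 14.0 and not at_anchor
    raise ValueError(f"unknown AIS class: {ais_class!r}")


def reception_quality(n_satellite_positions: int, vessel_hours: float) -> float:
    """Average satellite positions received per vessel per day in one grid cell.

    vessel_hours is assumed (hypothetically) to come from the hourly
    interpolated positions, so cells with presence but no messages still count.
    """
    vessel_days = vessel_hours / 24.0
    if vessel_days <= 0:
        return math.nan
    return n_satellite_positions / vessel_days
```

In this sketch the calculation would be run separately for Class A and Class B, per one-degree cell and per month.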

Predicted reception quality
Vessel presence in the ocean is highly concentrated in some areas but very sparse in others, and the empirical estimates of reception quality in a given one-degree cell are sensitive to the amount of vessel activity in that cell. Therefore, we chose to smooth observed reception quality by interpolation with a radial basis function (RBF). An RBF was chosen because it provides a relatively straightforward way to interpolate large datasets on an unstructured, global mesh and is well behaved in areas with low point density (43). This interpolation was performed using the scipy Python module (scipy.interpolate.Rbf) with a multiquadric radial basis function and a smoothing parameter equal to one. To minimize the influence of noise, cells with a very low amount of activity (< 15 hours) in a month were down-weighted by using the difference in activity from the minimum 15 hour threshold as a third dimension in the interpolation, effectively placing low activity cells further from the interpolation surface. Figure S4 shows the predicted AIS reception quality for Class A and Class B, respectively, across the full study period. We use the predicted AIS reception quality to set a minimum threshold of ten positions per day in order to exclude AIS gap events from areas of extremely unreliable AIS reception.
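A minimal sketch of this smoothing step is shown below. It uses scipy.interpolate.Rbf with a multiquadric function and a smoothing parameter of one, as stated above. Everything else is an assumption made for illustration: the function name, treating longitude and latitude as planar coordinates, and using the unscaled shortfall in hours as the third coordinate.

```python
import numpy as np
from scipy.interpolate import Rbf


def smooth_reception(lon, lat, reception, hours, min_hours=15.0):
    """Smooth observed reception quality over grid-cell centers with an RBF.

    Cells with fewer than `min_hours` of activity get a positive third
    coordinate (their shortfall below the threshold), which places them
    further from the surface we evaluate on (third coordinate = 0).
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    reception = np.asarray(reception, dtype=float)
    hours = np.asarray(hours, dtype=float)

    # Assumption: unscaled shortfall in hours; well-sampled cells sit at 0.
    shortfall = np.clip(min_hours - hours, 0.0, None)

    rbf = Rbf(lon, lat, shortfall, reception,
              function="multiquadric", smooth=1.0)

    # Evaluate on the zero-shortfall surface at the same cell centers.
    return rbf(lon, lat, np.zeros_like(lon))
```

In this sketch, cells whose smoothed value is ten positions per day or fewer would then be excluded when selecting gaps.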
The residuals (observed minus predicted reception) for both Class A and Class B models are normally distributed around zero (Fig. S5). When examining the spatial distribution of residuals, we restricted the analysis to cells with at least 24 hours of activity and capped reception quality at 100 positions per vessel day. These restrictions allowed us to better understand the interpolation's performance in areas where reception is of most concern (Fig. S6). This examination suggests under-prediction of reception for Class A AIS primarily occurs in areas of low reception and is evenly distributed in these areas. Over-prediction of reception is concentrated in several regions, including near the Faroe Islands, the Northern Indian Ocean, and Indonesia. Residuals for Class B AIS are much more spatially heterogeneous (Fig. S6). Areas of over-prediction for Class B reception quality include the North Pacific, the Gulf of Guinea, Western Australia, and in the high seas around the Galapagos. Notable areas where Class B reception quality is under-predicted include coastal West Africa, the Western Indian Ocean, and the Southern Atlantic near Argentina. However, reception quality in all of these areas is still above the ten positions per day threshold. Importantly, over/under-prediction of reception quality only affects our results when the prediction moves the reception quality in a cell above or below this threshold. A small number (0.8%) of suspected disabling events representing less than 0.5% of the total time lost to disabling occurred in areas where reception quality was over-predicted above the ten position threshold. Conversely, an additional 176 events were omitted due to the under-prediction of reception quality below the ten position threshold. If included, these events would only increase the total time lost to disabling by approximately 0.1%.

Restricting to gaps longer than 12 hours
Because AIS messages are not broadcast at a constant rate and satellite reception influences how frequently AIS messages are received, analyzing AIS disabling first requires defining the minimum period between AIS messages that is considered a gap. We restricted our analysis to only include gaps longer than 12 hours because satellite reception can vary quite substantially at time scales shorter than 12 hours. There are several dozen satellites between Spire and ORBCOMM, and they orbit the earth roughly every 110 minutes. A satellite can usually receive messages from any vessel within the satellite's field of view, which is usually between 2000 and 3000 km depending on the altitude of the satellite. The orbits are such that even in high reception areas, there can be a few hours without many satellites within range, and thus satellite reception varies quite significantly at short time ranges. However, when averaged over 12 hours or longer, satellite reception is roughly constant.
To illustrate how reception is variable at times shorter than 12 hours, we calculated the location and altitude of every Spire and ORBCOMM satellite for every second for two months. We then calculated, for every minute of the day, the average number of satellites that could be seen at a few given locations on earth. That is, how many satellites were over the horizon, and thus capable of receiving a signal from a vessel. While the patterns are different at different latitudes (Fig. S7 A,D,G,J), a Fourier analysis shows that at all latitudes except very high latitudes, the 12 hour period has the strongest signal and reception quality is therefore roughly constant after 12 hours (Fig. S7 B,E,H,K). Further, we calculated the standard deviation of the number of satellites overhead for different time windows, and found that the standard deviation dropped quickly until 12 hours, after which it remained low (Fig. S7 C,F,I,L).
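The two diagnostics above can be illustrated on a synthetic series. The sketch below assumes a per-minute count of visible satellites at one location; the function names and the synthetic signal (a 12 hour cycle plus a weaker 110 minute cycle) are hypothetical and are not the satellite data used in the study. The window statistic is one plausible reading of the standard deviation calculation: the spread of window-averaged counts.

```python
import numpy as np


def dominant_period_hours(sats_per_minute):
    """Period (hours) of the strongest non-constant Fourier component."""
    x = np.asarray(sats_per_minute, dtype=float)
    x = x - x.mean()                                   # remove the mean level
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / 60.0)      # cycles per hour
    peak = int(np.argmax(power[1:])) + 1               # skip the zero frequency
    return 1.0 / freqs[peak]


def std_of_window_means(sats_per_minute, window_hours):
    """Standard deviation of the mean satellite count over non-overlapping windows."""
    x = np.asarray(sats_per_minute, dtype=float)
    width = int(round(window_hours * 60))
    n_windows = x.size // width
    means = x[: n_windows * width].reshape(n_windows, width).mean(axis=1)
    return float(means.std())


# Synthetic example: ten days of per-minute counts (illustrative only).
minutes = np.arange(10 * 24 * 60)
series = (5.0
          + 2.0 * np.sin(2 * np.pi * minutes / 720.0)   # 12 hour cycle
          + 0.5 * np.sin(2 * np.pi * minutes / 110.0))  # ~110 minute orbit

period = dominant_period_hours(series)
spread_1h = std_of_window_means(series, 1)
spread_12h = std_of_window_means(series, 12)
```

For this synthetic signal the strongest component is the 12 hour cycle, and averaging over 12 hour windows removes most of the variability seen with one hour windows.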

Detecting suspected AIS disabling
All gaps in AIS transmission at least 12 hours in length were identified (section 2.3). Poor satellite reception (Section 1.1) and low broadcast rates (Section 2.1) may cause AIS transmission gaps that do not indicate intentional disabling by vessels. Therefore, a classification model was developed to identify AIS transmission gaps that are most likely due to intentional disabling as a function of expected satellite reception quality and AIS broadcast rates.

Rule-based classification model specification
Four features were calculated for each AIS transmission gap, including 1) predicted monthly reception quality where the gap started (described in Section 2.2); 2) number of positions within 12 hours prior to the gap; 3) number of positions within 18 hours prior to the gap; and 4) number of positions within 24 hours prior to the gap. Metrics were limited to when AIS transmission ceases and the gap begins so that this model could be operationalized to identify suspected disabling events in real or near real time without relying on metrics related to when AIS transmissions begin again post gap events. Reception quality (feature 1) is based on the average transmission frequency of all vessels operating in an area, whereas individual vessel position ping rates (features 2-4) account for variability in the transmission frequency of individual AIS devices. Since a vessel can be in a high reception area but ping infrequently and vice versa, this set of features allowed us to test the classification power of both internal and external factors. A set of rule-based classification models that apply thresholds to combinations of these features in an IF-THEN framework were developed, including models using only reception, models using only individual vessel position ping rates, and models that combine both types of features (Table  S1).
Repeated k-Fold Cross Validation was used to identify the ping rates (features 2-4) and/or reception (feature 1) thresholds with the highest performance for identifying intentional disabling events. Since the objective of this study is to identify suspicious and potentially illegal behavior, we wanted to prioritize precision to limit the potential for false positives while still accounting for overall accuracy and recall. Therefore, model performance is calculated with F0.5 score, a variant of an F1 score that weights precision twice as much as recall by setting β=0.5, as described in (44). Models were evaluated for every whole integer for individual vessel position ping rate threshold, k, where 1 <= k <= 60 positions in the specified time period (12 or t hours before or after gap), and reception threshold, j, where 11 <= j <= 60 (Fig. S8). The lower bound of j was set at eleven as gaps that start in areas with a reception of ten or less are already excluded from the gaps dataset (Section 2.2).
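As an illustration of the scoring and threshold search, the sketch below computes an F-beta score from confusion counts and sweeps the ping rate threshold k from 1 to 60 for a single-feature rule. The function names are hypothetical, the data layout is an assumption, and the cross validation loop is omitted for brevity.

```python
def fbeta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from true positives, false positives, and false negatives."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom else 0.0


def best_ping_threshold(pings_before, is_true_gap, k_values=range(1, 61), beta=0.5):
    """Return (best_score, best_k) for the rule: predict disabling if pings >= k.

    pings_before: positions received in the window before each gap.
    is_true_gap:  labels, True when the gap is a labeled "true" gap.
    Ties are resolved in favor of the smallest k.
    """
    best_score, best_k = -1.0, None
    for k in k_values:
        tp = fp = fn = 0
        for pings, truth in zip(pings_before, is_true_gap):
            predicted = pings >= k
            if predicted and truth:
                tp += 1
            elif predicted and not truth:
                fp += 1
            elif truth:
                fn += 1
        score = fbeta(tp, fp, fn, beta)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, best_k
```

With β=0.5, a false positive lowers the score more than a false negative, which is why the search favors conservative thresholds.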

Labeled gaps
To better assess whether AIS transmission gaps in the GFW AIS dataset were indeed true (i.e. intentional) AIS disabling events, we acquired AIS data from a third AIS provider, exactEarth, for a subset of vessels in 2017 to 2019. The subset included 4,403 unique MMSI from 89 flag states that collectively covered 112,567 transmission gaps in the GFW AIS data for 2017 to 2019 that are at least 12 hours long, start in areas with a reception greater than ten positions per vessel day, and start at least 50 nautical miles from shore. This subset of gaps to be labeled was investigated and compared to the full set of gaps and was found to be reasonably representative both spatially and across gear types (Figs. S9 and S10).
We then merged the exactEarth AIS data with the GFW AIS data to create a labeled dataset consisting of transmission gaps labeled as "true" gaps (those common to both datasets and thus suspected of occurring from intentional disabling) and "false" gaps (those which are unlikely to be due to intentional disabling and instead were likely caused by technical issues). Gaps were labeled as "true" if there was still a gap of at least 12 hours after adding in the exactEarth positions. If the gap was no longer at least 12 hours after doing this, then it was labeled as "false". This criterion was set based on the patterns of satellite coverage outlined in Section 2.3. The final labeled data set was divided into separate training and test sets using a 70-30 group shuffle split before performing model selection. The data were grouped on MMSI so that all of the gaps from a given MMSI were entirely in either the training set or the test set. Having gaps from the same MMSI in both sets was found to cause instability in the cross validation scores. Due to the grouping, the final split was actually 72-28 but will be referred to as the 70% training set and 30% test set for simplicity.
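The labeling rule and the grouped split can be sketched as below. This is a simplified illustration with hypothetical function names: times are plain hours rather than timestamps, and the split is a hand-written stand-in for a group shuffle split rather than the routine used in the study.

```python
import random


def label_gap(start_h, end_h, extra_position_hours, min_gap_h=12.0):
    """Return True ("true" gap) if a gap of at least `min_gap_h` hours remains
    after adding third-party positions that fall inside the original gap."""
    inside = sorted(t for t in extra_position_hours if start_h < t < end_h)
    times = [start_h] + inside + [end_h]
    longest = max(b - a for a, b in zip(times, times[1:]))
    return longest >= min_gap_h


def group_split(mmsis, test_fraction=0.3, seed=0):
    """Split gap indices so that every MMSI lands entirely in one set.

    The fraction applies to MMSI groups, so the share of gaps in each set
    can differ from the nominal split.
    """
    groups = sorted(set(mmsis))
    random.Random(seed).shuffle(groups)
    n_test = round(len(groups) * test_fraction)
    test_groups = set(groups[:n_test])
    train_idx = [i for i, m in enumerate(mmsis) if m not in test_groups]
    test_idx = [i for i, m in enumerate(mmsis) if m in test_groups]
    return train_idx, test_idx
```

Because whole vessels are assigned to one side, the realized gap-level proportions drift from the nominal 70-30, consistent with the 72-28 split reported above.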

Model selection
Model selection was performed using Repeated k-Fold Cross Validation on the 70% training set using the standard of ten folds repeated ten times. Model performance was evaluated using the F0.5 score on the validation set for each cross validation run. This is separate from our use of F0.5 score during model fitting as described in Section 3.1, but it is done for the same reason of wanting to prioritize precision over recall. The average of the F0.5 scores for these 100 runs was calculated for each potential model to find the optimal model (Table S2). Once a final model was selected, it was trained on the full 70% training set and accuracy metrics were calculated using the independent 30% test set.

Model selection results
The distribution of the F0.5 scores for each model can be found in Figure S11. The model with the optimal score was rec_12hb (F0.5=0.741). However, we selected the 12hb model for gaps classification, which scored nominally lower (F0.5=0.739). Because the 12hb model does not rely on satellite reception, it is easier to operationalize in order to eventually produce a suspected disabling dataset that updates in real-time and promotes applied usage of the data. The 12hb model was then parameterized by training on the full 70% training set and was determined to have an optimal vessel position ping rate threshold, k, of 14. This means that gaps with at least 14 positions in the 12 hours before the gap start will be considered to be suspected disabling gaps. This model was used to generate the set of gaps (hereafter: suspected disabling events) used in the remainder of the analysis.
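A minimal sketch of how the selected 12hb rule could be applied to a single gap is given below. The function names are hypothetical, and whether the last position at the gap start counts toward the 12 hour window is an assumption (it is counted here).

```python
from datetime import datetime, timedelta


def positions_before(gap_start, timestamps, hours=12):
    """Count positions in the `hours` before the gap start (inclusive of both ends)."""
    window_start = gap_start - timedelta(hours=hours)
    return sum(1 for t in timestamps if window_start <= t <= gap_start)


def is_candidate_gap(gap_hours, start_distance_from_shore_nm, start_reception):
    """Gap selection criteria: at least 12 hours long, starting more than
    50 nautical miles from shore, in reception above ten positions per day."""
    return (gap_hours >= 12
            and start_distance_from_shore_nm > 50
            and start_reception > 10)


def is_suspected_disabling(n_positions_12h_before, k=14):
    """12hb rule: at least k positions in the 12 hours before the gap start."""
    return n_positions_12h_before >= k
```

For example, a vessel received every 30 minutes before a gap would be flagged, while one received only every two hours would not, regardless of the reception quality of the area.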
Using the 30% test set withheld before cross validation, this model was shown to work well to limit false positives with a false positive rate of 3.72% and a precision of 0.86 (Fig. S12). The F0.5 score on the test set is 0.718 and the accuracy is 66% with accuracy loss due largely to false negatives (42.93%). This is expected as model selection was aimed at being conservative and minimizing false positives.
The optimal parameterization for each model demonstrated that the addition of individual vessel position ping rate features greatly improved upon the performance of the reception-only model (Fig. S8). However, the addition of reception to position features showed negligible changes in performance from those with position features only (Table S2). Evaluating model performance at each reception threshold showed a mostly stable or nominal increase in optimal model performance with increasing reception followed by a steep decline around 35 positions per vessel day (Fig. S13). These results indicate that using a reception threshold higher than the lower limit of 10 positions per vessel day is not beneficial to model performance. This phenomenon makes sense when considering the use of F0.5 instead of F1 score and that individual vessels within areas of the same reception quality can have vastly different position ping rates. Even in the lowest reception areas, a subset of vessels can still have a sufficient number of positions surrounding their gaps to strongly indicate that the sudden absence of positions is due to intentional disabling. Increasing the reception quality threshold falsely classified these true gaps as false, leading to a decrease in model performance.

Suspected AIS disabling dataset
The final dataset of 55,368 suspected disabling events from 2017-2019 was created by applying the chosen model to all AIS transmission gaps of at least 12 hours that start in areas more than 50 nautical miles from shore and with a reception greater than ten positions per vessel day. These gap events represent those AIS transmission gaps for which we have the most confidence that vessels intentionally disabled their AIS devices. This final dataset of 55,368 suspected disabling events included events by 5,269 distinct MMSI from 101 flag states, with a median duration of 23.5 hours. The median distance traveled during suspected disabling events was 78.6 kilometers. However, the distributions of duration and distance traveled were both heavily right-skewed (Fig. S14). China, Taiwan, Spain, and the United States were the top four flag states in terms of suspected disabling events, collectively accounting for more than 65% of all suspected disabling events (Table 1). With regard to vessel class, drifting longlines, squid jiggers, trawlers, and tuna purse seines accounted for over 92% of suspected disabling events (Table 1).

Spatial allocation of disabling events and time lost to these events
To quantify the scale of the management problem presented by AIS disabling, we developed methods to approximate where vessels operate when their AIS is disabled, thus allowing us to spatially allocate the time vessels spend with their AIS disabled. This analysis required 1) determining if the vessels were, in fact, at sea (and not at port) when their AIS was disabled, and 2) for vessels at sea, estimating where they likely traveled between where they disabled their AIS and turned it back on. This analysis was done using GFW AIS data and not the merged dataset described in section 3.2. Global Fishing Watch processes its AIS data daily with an automated set of algorithms which generate several features important for this analysis and integrating the exactEarth subset into this process was not feasible.
To identify if vessels were still at sea, we analyzed the median and mean voyage time of vessels on high seas voyages to determine if they likely visited port in the interim. To spatially allocate time lost to disabling events, we explored two different methods of varying complexity (linear interpolation and rasterized probabilities) to estimate where vessels operated with their AIS disabled. For our key findings (total and fraction of fishing vessel activity obscured by disabling events, and disabling hotspots; Figures 1&2, Table 1) the two methods provided very similar results (Table S3). Thus our main results use the simplest of these methods (linear interpolation), but both are described and reported here.

Time at port versus sea
The majority of disabling events are short in duration, with 94% being shorter than two weeks. Longer events, though, may account for disproportionately more activity, simply because each event has more time. A key question is if a vessel is at sea or port when its AIS is disabled. Because vessels often turn off their AIS in port, a long disabling event may be a sign that the vessel has gone to port and is not at sea. On the other hand, some vessels can spend many months or longer than a year at sea (15).
We analyzed GFW's database on the length of vessel voyages from 2017 to 2019, and found that the mean time between port visits for voyages farther than 50 nautical miles from shore was 27 days, and voyages longer than 73 days accounted for half of the total vessel activity. If a disabling event is longer than these values, it is more likely that a vessel spent some of the time during a disabling event at port. Thus, for calculating the likely lower bound on the time lost to disabling events, we only consider disabling events significantly shorter than the mean voyage time. We chose a time of 14 days, about 30 percent below the mean voyage. Because voyages are mostly longer than 14 days, our data supports that the vast majority of the time in these disabling events was spent at sea and not at port. However, it is possible that the longer events (> 14 days) represent mostly time at sea and not at port. Therefore, our upper bound on time lost in the high seas allocates all time in disabling events of any length.

Spatial allocation of disabling events
Time lost to suspected AIS disabling is equal to the total duration (hours) of AIS disabling events and is spatially allocated based on two different methods. We spatially allocate activity in disabling events at one degree (grid size of ~111km at the equator).
In the first method (linear interpolation), we interpolate vessel positions between the start and end positions of each disabling event, assuming that vessels travel in a straight line when their AIS is off. While vessels are unlikely to transit in a straight line, at a relatively low grid resolution (one degree) this method provides an estimate of where vessel activity took place during disabling events.
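The linear interpolation method can be sketched as follows. This is an illustrative simplification with a hypothetical function name: it interpolates directly in longitude and latitude (not along a great circle), does not handle antimeridian crossings, and samples the track at hourly steps by default.

```python
import math
from collections import defaultdict


def allocate_linear(start, end, duration_hours, steps_per_hour=1):
    """Spread the duration of a disabling event over one-degree cells
    along a straight line from `start` to `end`, each given as (lon, lat).

    Returns {(floor(lon), floor(lat)): hours}.
    """
    (lon0, lat0), (lon1, lat1) = start, end
    n = max(1, int(round(duration_hours * steps_per_hour)))
    hours_per_step = duration_hours / n
    cells = defaultdict(float)
    for i in range(n):
        f = (i + 0.5) / n                      # midpoint of each time step
        lon = lon0 + f * (lon1 - lon0)
        lat = lat0 + f * (lat1 - lat0)
        cells[(math.floor(lon), math.floor(lat))] += hours_per_step
    return dict(cells)
```

Summing these per-event dictionaries across all events would give the gridded time lost to disabling under this method.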
In the second method (rasterized probabilities), we used broadcasting vessels to build probabilistic rasters of where a vessel likely traveled between two positions that are a given distance and time apart. For instance, to determine where a drifting longline may have gone in a disabling event that was three days in duration and 50km in distance, we compiled all tracks in the AIS dataset where a drifting longline had two positions 50km and three days apart and that vessel left its AIS on for the three days in between. Using these tracks, we built a probabilistic raster showing the positions these vessels visited in the interim. The result is a heatmap of where vessels with that starting and ending point were likely to visit.
Because each vessel class has distinctive movements, we built probabilistic rasters for each of our five vessel classes (drifting longlines, trawlers, squid jiggers, tuna purse seines, and other). We identified activity at distances of 0, 10, 20, 40, 80, 160, 320, 640 and 1280 km, and durations of 0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30 and 35 days. These distances and times were chosen to make sure almost every disabling event had a raster that was relatively close in duration or distance. All combinations of duration and distance were paired, with a different raster for each vessel type (675 cases in total). Each of these cases, except for some that were physically impossible (e.g. 1280 km in 12 hours), had several hundred examples of vessels to draw on. We then plotted every single position that a vessel visited between the start and end point, gridded the amount of time spent in each grid cell (these probability rasters were gridded at a resolution of 10 km, 10x finer than the resolution of our final analysis), and then averaged these values across all vessels in a given vessel class (Figure S15 shows some of these probabilistic rasters). For each disabling event, we selected the probability raster that best corresponded in distance, duration, and gear type to the disabling event in question. We then applied a similarity transformation to the probability raster so that its start and end points correspond to those of the disabling event. The time spent in each grid cell was also adjusted so that the entire raster time sums to the duration of the disabling event. Figure S16 shows an example of a few disabling events spatially allocated using this method and using the interpolation method.
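The raster lookup and similarity transformation can be illustrated with the sketch below. It is written under several assumptions that are not stated in the text: the closest template is chosen by absolute difference in distance and duration, rasters are represented as lists of planar (x, y) cell centers in kilometers with hours attached, and the function names are hypothetical. Complex arithmetic is used because multiplying by a complex number is a rotation plus uniform scaling.

```python
DISTANCES_KM = [0, 10, 20, 40, 80, 160, 320, 640, 1280]
DURATIONS_DAYS = [0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35]


def nearest_bin(value, bins):
    """Closest template bin by absolute difference (ties go to the smaller bin)."""
    return min(bins, key=lambda b: abs(b - value))


def fit_template(template, t_start, t_end, ev_start, ev_end, duration_hours):
    """Map a template raster onto a disabling event.

    template: list of ((x, y), hours) cells in the template's planar frame.
    t_start, t_end: template start and end points; ev_start, ev_end: event
    start and end points, all as planar (x, y) in km.
    Returns a list of ((x, y), hours) whose hours sum to `duration_hours`.
    """
    s0, e0 = complex(*t_start), complex(*t_end)
    s1, e1 = complex(*ev_start), complex(*ev_end)
    # Rotation and uniform scale; pure translation for zero-distance templates.
    a = (e1 - s1) / (e0 - s0) if e0 != s0 else 1.0
    total = sum(hours for _, hours in template)
    fitted = []
    for (x, y), hours in template:
        w = s1 + a * (complex(x, y) - s0)
        fitted.append(((w.real, w.imag), hours * duration_hours / total))
    return fitted
```

The fitted cells would then be aggregated to the one-degree analysis grid, which requires converting the planar coordinates back to longitude and latitude (not shown).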
When using a one degree grid, these two methods (linear interpolation and rasterized probabilities) yield very similar results: our key findings are essentially identical at short time periods (< 2 weeks), but results deviate at longer time periods (Table S3). For disabling events under one week (84% of the events), almost all activity is within one degree (Fig. S17). For events between one and two weeks, about 80% is within one degree. This percentage drops with longer disabling events, and for events four weeks and longer (~3% of events), only about half of the time lost is close to a line between the two locations (Fig. S17). Figure S18 shows the spatial allocation of time in disabling events for both the rasterized probability and linear interpolation methods, for disabling events shorter than two weeks and for all disabling events.

Total time and fraction of time lost to disabling events
To estimate the time lost to AIS disabling, we used both the linear interpolation and rasterized probabilities methods to spatially allocate disabling events, and summed the total time spent more than 50 nautical miles from shore and above a minimum reception quality of ten positions per day. The linear interpolation method showed slightly higher time lost to disabling (~5% more time) compared to the rasterized probability method (Table S3). This is because the rasterized probability method smooths out interim disabling event positions, placing some positions (and therefore some of the time lost) within 50 nm of shore and outside of our study area (e.g. Fig. S16). Nonetheless, the numbers differ very little, as shown by Table S3. The lower bound on time lost only considers disabling events under two weeks. The upper bound includes all disabling events, as described in Section 5.1.
The fraction of time lost to disabling events was calculated as the time lost to disabling events divided by the time at sea for all fishing vessels in the study area. The time at sea is the total time broadcasting including all gaps in transmission (which includes both intentional disabling and gaps due to poor AIS reception). Using either method for allocating these longer gaps (raster method or interpolation), we estimate that 3-6% of fishing vessel activity is lost to suspected AIS disabling events, although there are slight differences for some of the gear types between the two methods (Tables 1 & S3).
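As a worked illustration of these bounds, the sketch below computes a lower bound from events shorter than two weeks and an upper bound from all events, then divides each by the time at sea. The event durations and the time-at-sea total are made-up values for illustration only, and the function names are hypothetical.

```python
TWO_WEEKS_HOURS = 14 * 24


def time_lost_bounds(event_hours):
    """Lower bound: events under two weeks only. Upper bound: all events."""
    lower = sum(h for h in event_hours if h < TWO_WEEKS_HOURS)
    upper = sum(event_hours)
    return lower, upper


def fraction_lost(lost_hours, at_sea_hours):
    """Time lost to disabling divided by total time at sea (gaps included)."""
    if at_sea_hours <= 0:
        raise ValueError("time at sea must be positive")
    return lost_hours / at_sea_hours


# Made-up durations (hours) for four disabling events and a made-up time at sea.
lower, upper = time_lost_bounds([24, 100, 400, 700])
lower_fraction = fraction_lost(lower, 20000)
upper_fraction = fraction_lost(upper, 20000)
```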
To spatially allocate the fraction of time lost to disabling events (Figures 1&2) we only include disabling events shorter than two weeks in duration. We have the most confidence in our ability to map these events because our analyses demonstrate that under two weeks 1) vessels are unlikely to have visited port (Section 5.1), 2) there is high agreement between the linear interpolation and rasterized probability methods (Section 5.2), and 3) 80% of vessels are within one degree of a straight line connecting the start and end of disabling events (Fig. S17).

Behavioral and environmental drivers
To understand why vessels intentionally disable their AIS devices, we acquired data on potential behavioral and environmental drivers from multiple sources (Table S4).

Behavioral drivers
GFW has compiled a dataset of refrigerated cargo vessels ("carriers") from which it has identified two-vessel encounters and single-vessel loitering behaviors using rule-based methods (7,17). Loitering behavior by carrier vessels is identical to behavior during two-vessel encounters with fishing vessels, except that the potential second vessel is not visible on AIS. Therefore, loitering behavior in an area may indicate activity by fishing vessels that have disabled their AIS. Loitering activity by carrier vessels was converted into a quarter degree raster showing the total number of loitering hours across 2017-2019 in each grid cell. We log transformed loitering hours to reduce the right skew caused by low loitering hours in many grid cells.
We acquired a high resolution land shapefile from the Pacific Islands Ocean Observing System. To identify marine protected areas, we acquired the World Database on Protected Areas (protectedplanet.net) and selected all designated marine protected areas regardless of International Union for Conservation of Nature status. Anti-Shipping Activity Messages (hereafter: piracy) were downloaded from the US National Geospatial-Intelligence Agency across the full available time-series (February 1979 through December 2019). These are mainly reported pirate attacks (88.6%) but include other encounters such as kidnapping, naval engagement, and suspicious approaches. This long time-series was selected to account for the role of historically dangerous waters in driving vessels to disable their AIS devices.
Distances from shore, marine protected areas, and piracy events were calculated and resampled into quarter degree rasters (Fig. 6). All distance drivers were reclassified beyond 400 km to a constant value. In the case of distance from shore, the 400km threshold was measured from 200 nm from shore (the distance of most Exclusive Economic Zone boundaries). This threshold was selected to provide enough distance values to understand the statistical relationship between our response variable and predictor variables in our models that are described below, i.e. within 400 km there are roughly 16 quarter degree grid cells over which variation in the response variable can be calculated. The reclassification removed the ability of distances beyond 400 kilometers to explain any of the variation in the response variable. This was an important step because models might detect correlative relationships between suspicious disabling events and features that are very distant, for example across an ocean basin. While such a relationship may have statistical significance, it is unlikely to have geographic or behavioral meaning.
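The reclassification can be sketched as a simple cap on each distance value. The sketch assumes that "measured from 200 nm from shore" means the cap for distance from shore sits at 200 nm plus 400 km; the function name and the `offset_km` parameter are hypothetical.

```python
NM_TO_KM = 1.852  # one nautical mile in kilometers


def cap_distance(distance_km, cap_km=400.0, offset_km=0.0):
    """Reclassify distances beyond offset_km + cap_km to that constant value."""
    return min(distance_km, offset_km + cap_km)


# Distance to a marine protected area or piracy event: capped at 400 km.
mpa_capped = [cap_distance(d) for d in (120.0, 400.0, 2500.0)]

# Distance from shore: assumed to be capped 400 km beyond the 200 nm line.
shore_capped = [cap_distance(d, offset_km=200 * NM_TO_KM) for d in (120.0, 600.0, 2500.0)]
```

Capping rather than discarding distant cells keeps every grid cell in the model while preventing far-field distances from explaining variation in the response.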

Environmental drivers
Data for environmental drivers (Fig. 7) were acquired from the Copernicus Marine Environment Monitoring Service for 2017-2019 (Table S4). This time-series required us to use both reprocessed and near real-time products for chlorophyll-a and eddy kinetic energy. All data were acquired as daily rasters at quarter degree resolution. To produce sea surface temperature and chlorophyll-a fields, the daily rasters were averaged across 2017-2019 to produce a climatology for each product. Chlorophyll-a was log10 transformed. To calculate the variability of sea surface temperature, the temporal standard deviation of the daily sea surface temperature in each grid cell was calculated across 2017-2019. The following formula was used to calculate daily eddy kinetic energy (EKE), where i is an individual grid cell, and vgosa and ugosa are the meridional and zonal components of geostrophic velocity anomalies, respectively:
$$\mathrm{EKE}_i = \frac{1}{2}\left(\mathrm{ugosa}_i^2 + \mathrm{vgosa}_i^2\right)$$
Daily eddy kinetic energy rasters were then averaged across 2017-2019 to produce a climatological raster (Fig. 7).
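A minimal sketch of these raster calculations follows. It assumes the standard definition of eddy kinetic energy as half the sum of the squared zonal and meridional velocity anomalies, represents each daily raster as a nested list, and uses the population standard deviation for temperature variability (the text does not specify which estimator was used). The helper `per_cell` is hypothetical.

```python
from statistics import mean, pstdev


def eddy_kinetic_energy(ugosa, vgosa):
    """Assumed standard definition: EKE = 0.5 * (ugosa^2 + vgosa^2)."""
    return 0.5 * (ugosa ** 2 + vgosa ** 2)


def per_cell(daily_grids, func):
    """Apply func to the time series of each cell in a stack of equally sized 2D grids."""
    n_rows, n_cols = len(daily_grids[0]), len(daily_grids[0][0])
    return [
        [func([grid[r][c] for grid in daily_grids]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two made-up daily sea surface temperature rasters (1 row x 2 columns).
daily_sst = [[[1.0, 2.0]], [[3.0, 4.0]]]
sst_climatology = per_cell(daily_sst, mean)    # temporal mean per cell
sst_variability = per_cell(daily_sst, pstdev)  # temporal standard deviation per cell
```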

Boosted regression tree models
The probability of suspected disabling events was modeled as a function of environmental and behavioral drivers using a popular machine learning method called boosted regression trees (BRTs) via the 'dismo' R package (45). BRTs can fit complex nonlinear relationships and are robust to a wide variety of data types and distributions, and have proven application for understanding the drivers of fishing fleet distributions (40,46). The BRT output included a measure of the relative importance of behavioral and environmental drivers for suspected disabling events and fishing activity (Fig. 3). Relative importance (or relative contribution) is a measure of each variable's influence on the response, and is based on the number of times each variable is selected for splitting across all trees (45). Importance is scaled across all variables such that the sum adds to 100, producing relative importance, where higher values indicate stronger contributions (45).
We modeled the probability of suspected disabling using fishing activity as absences to understand if and how these two inherently related behaviors diverged. Prior to modeling, data for both behaviors were restricted to beyond 50 nm from shore to match the dimensions of the suspected disabling dataset. Five models were built for suspected disabling events: a full model containing all the data, and four individual gear type models for drifting longlines, trawlers, squid jiggers, and tuna purse seines. We built individual gear type models to understand how amounts of disabling vary between gear types and if the differing levels of management oversight across gear types affected fleet behavior. The full model contained all disabling events in the dataset, 63% of which were from the four gear types for which individual models were built (drifting longlines, trawlers, squid jiggers, and tuna purse seines). The remaining 37% of the data came from gear types without enough disabling events to build robust models (e.g. set gillnets, trollers), or from vessel tracks that could not be classified to the gear level by the GFW neural network.

Presences and absences
For our boosted regression tree modeling approach, we specified presences and absences from our dataset (Fig. S19A). For suspected disabling events, presences were grid cells in which there was at least one disabling event from 2017-19, ranging from 28874 presences in the full model to 2342 presences for tuna purse seines (Table S5). Absences were defined by randomly subsampling locations with fishing activity and no suspected disabling events across 2017-19. This method selected locations where suspected disabling could have occurred, but was not detected. For the full suspected disabling events model, absences were subsampled across all locations with fishing activity. In the individual gear models, absences were subsampled across locations with fishing activity from the target gear type (e.g. absences for trawlers were selected from grid cells in which trawlers fished but did not disable their AIS devices). For all models, a presence:absence ratio of 1:1 was used, following results that this ratio works best for BRT models (47).
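The absence sampling can be sketched as follows, assuming grid cells are identified by hashable keys such as (row, column) tuples. The function name and the seed argument are hypothetical and are not part of the original analysis.

```python
import random


def sample_absences(fished_cells, disabling_cells, seed=None):
    """Return (presences, absences) at a 1:1 ratio.

    Presences are cells with at least one suspected disabling event.
    Absences are drawn at random from fished cells with no disabling events.
    """
    presences = sorted(set(disabling_cells))
    candidates = sorted(set(fished_cells) - set(disabling_cells))
    if len(candidates) < len(presences):
        raise ValueError("not enough fished cells without disabling events")
    rng = random.Random(seed)
    absences = rng.sample(candidates, len(presences))
    return presences, absences


# Made-up grid cells for illustration.
fished = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 3)]
disabling = [(0, 0), (1, 1)]
presences, absences = sample_absences(fished, disabling, seed=1)
```

For a gear-specific model, `fished_cells` would be restricted to cells fished by the target gear type before sampling.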

Model fitting and validation
A binomial distribution was used for all BRT models, which is appropriate for presence/absence data. BRTs were built with a bag fraction of 0.6, and a tree complexity of three (45). The learning rate was varied across models between 0.1 and 0.001 depending on the number of data points to ensure at least 1,000 trees were fit for each model (45). The resultant BRTs provided two important outputs: the relative importance of drivers, which allowed us to compare the relative importance of environmental and behavioral drivers across models, and the partial responses to the drivers, which allowed us to understand how suspected disabling and fishing activity respond to the various drivers.
Model performance was evaluated using three metrics: explained deviance, Area Under the Receiver Operating Characteristic Curve (AUC), and True Skill Statistic (TSS). Explained deviance measures how well models can capture patterns in the response variable, and is a measure of model explanatory power. AUC and TSS are metrics that reveal the ability of models to accurately discriminate between presences and absences in novel data, and are thus measures of model predictive skill. Explained deviance was calculated for each of the final models. AUC and TSS were calculated using 50 iterations of 75/25 cross-validation to explore model performance on novel data. For each iteration and each suspected disabling and fishing activity model (n = 10), a new set of absences was randomly selected (while maintaining the 1:1 ratio of presences to absences). Then, new models were trained using a random 75% subset of the data and tested against the remaining 25% of the data.
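The two predictive skill metrics and the 75/25 split can be sketched without any modeling library. This sketch assumes AUC is computed as the probability that a randomly chosen presence scores higher than a randomly chosen absence, and that TSS (sensitivity plus specificity minus one) is evaluated at a probability threshold of 0.5; the threshold is an assumption, as the text does not state one. The predicted probabilities are made-up values.

```python
import random


def auc(labels, scores):
    """Probability that a presence outranks an absence (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tss(labels, scores, threshold=0.5):
    """True Skill Statistic = sensitivity + specificity - 1."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1


def split_75_25(n, rng):
    """Shuffle indices 0..n-1 and split them into 75% train and 25% test."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(0.75 * n))
    return idx[:cut], idx[cut:]


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # made-up predicted probabilities
train_idx, test_idx = split_75_25(100, random.Random(0))
```

In a full cross-validation loop, the split, absence resampling, model fit, and metric calculation would be repeated 50 times and the metrics averaged.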
Model performance was high across the five suspected disabling models (Table S6). Explained deviance ranged from 12.17% in the drifting longline suspected disabling model to 61.42% in the trawler model. Predictive skill was high across the 5 models, with average AUC values of 0.83 and average TSS values of 0.53. Standard deviations were low, indicating that performances were robust across different model iterations.
As a point of comparison, we built full models and four gear-specific models for vessel presence and fishing activity following the methods above. Vessel presence, fishing activity, and suspected disabling are modeled as nested behaviors: vessel presences represent where vessels choose to go including behaviors such as traveling to fishing grounds and ports; fishing activity is a behavioral subset of vessel presence locations where vessels choose to fish, and suspected disabling is a subset of fishing activity locations where vessels chose to disable their AIS devices. For the vessel presence models (Fig. S19C), presences were grid cells with at least one hour of vessel presence across the 2017-19 time-series, and absences were generated by randomly sampling locations with no vessel presence across the time-series (i.e. background sampling). For fishing activity (Fig. S19B), presences were grid cells that had been actively fished for at least one hour across the time-series, and pseudo-absences were generated by randomly sampling locations with vessel presence and no fishing activity across the time-series. For suspected disabling (Fig. S19A), presences were grid cells with at least one disabling event across the time-series, and absences were generated by randomly sampling locations with fishing activity and no disabling events across the time-series (same models as described above in Section 7.1). As such, these three sets of models test three different hypotheses: 1. Are locations with vessel presence different from the background environment? 2. Are locations with fishing activity different from locations with vessel presence? 3. Are locations with suspected disabling different from locations with fishing activity?
We explored the niche separation of each behavior by comparing the marginal effects of the most important variable in each disabling model to the corresponding marginal effects in the fishing activity and vessel presence models (Fig. S20). In the full model, the marginal effect of loitering activity plateaued at 100 hours for suspected disabling versus ~10 hours for fishing activity and vessel presence. For drifting longlines, the marginal effect of distance to shore showed a peak around 200 nm for suspected disabling, while fishing activity and vessel presence were low inshore of 200 nm, increased steeply, and then remained high offshore of 200 nm. For tuna purse seiners, the marginal effect of SST rose steeply between 20 and 30 °C for vessel presence, while the marginal effect for fishing activity and suspected disabling did not show a strong response. For squid jiggers, the marginal effect for loitering had a distinct peak between 100-1000 hours that was not present for the other two behaviors. For trawlers, the marginal effect of chl-a indicated a higher occurrence of suspected disabling in productive waters, while fishing activity and vessel presence were more broadly distributed.

Squid jigger disabling and loitering
Loitering activity was the most important predictor in the squid jigger model for suspected disabling, with the occurrence of suspected disabling increasing as loitering activity increased (Figs. 3 and 4). To understand why loitering was more important in the suspected disabling model compared to the fishing activity model, we examined the proximity of suspected disabling events, fishing activity, and vessel presence to loitering events.
To calculate relative proximity to loitering events, we interpolated all fishing activity, vessel activity, loitering activity, and disabling events to a constant temporal grid, so we had the likely location of fishing, vessels, loitering, and disabling events at every hour. For each fishing and vessel position and each disabling event, for each hour, we then calculated the distance to the closest loitering event. We then calculated the fraction of fishing activity, vessel activity, and disabling events that were within a given distance from loitering events. We found that squid jigger suspected AIS disabling events occur considerably closer to loitering events than fishing activity or vessel activity (Fig. S21).
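The proximity calculation can be sketched as follows, assuming positions have already been interpolated to an hourly grid, that "closest loitering event" refers to loitering positions in the same hour, and that great-circle (haversine) distance on a sphere of radius 6371 km is an acceptable distance measure. The data structures and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_loitering_km(hourly_positions, hourly_loitering):
    """For each hour with loitering, distance from the position to the closest loitering point."""
    distances = []
    for hour, pos in hourly_positions.items():
        loitering = hourly_loitering.get(hour, [])
        if loitering:  # hours without any loitering are skipped
            distances.append(min(haversine_km(pos, q) for q in loitering))
    return distances


def fraction_within(distances_km, threshold_km):
    """Fraction of distances at or below the threshold."""
    return sum(d <= threshold_km for d in distances_km) / len(distances_km)


# Made-up hourly positions of one disabling event and of loitering carriers.
event = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (0.0, 0.0)}
loitering = {0: [(0.0, 0.5)], 1: [(0.0, 5.0), (0.0, 3.0)]}
dists = nearest_loitering_km(event, loitering)
share_within_100km = fraction_within(dists, 100.0)
```

Evaluating `fraction_within` over a range of thresholds for disabling events, fishing activity, and vessel activity gives cumulative curves that can be compared directly.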
It is important to note that, due to the high variability of satellite reception quality at time scales shorter than 12 hours (Section 2.3), we only consider suspected disabling events longer than 12 hours in duration. Transshipment events generally take less than 12 hours, and thus we are likely underestimating the importance of loitering events in the BRT models.

Suspected disabling events adjacent to EEZ boundaries
We compared the ratio of suspected disabling and fishing activity within 100 km of EEZ boundaries for each EEZ and flag state to investigate if there is more suspected disabling than would be expected from the amount of fishing activity (Fig. S22). To assess the pairwise relationships between flag states and EEZs involved in EEZ-adjacent disabling, suspected disabling events within 100 kilometers of an EEZ were associated with the nearest EEZ. For this analysis, version 11 of EEZ boundaries was downloaded from marineregions.org. These data were summarized as the total number of suspected disabling events between each EEZ and flag state pair, e.g. the total number of suspected disabling events in Spanish fleets adjacent to the Liberian EEZ. As a point of comparison, we also assessed the pairwise relationships between flag states and EEZs involved in EEZ-adjacent fishing. All grid cells that were actively fished within 100 kilometers of an EEZ were associated with the nearest EEZ. These data were summarized as the total number of fishing days between each EEZ and flag state pair, e.g. the total number of days Spanish fleets fished in waters adjacent to the Liberian EEZ. For each EEZ-flag state pair, the number of suspected disabling events and fishing days were converted into the percentage of total EEZ-adjacent suspected disabling events and fishing days, respectively. This allowed us to identify EEZs and flag states that had higher percentages of total suspected disabling than expected based on their total fishing days.
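The percentage comparison can be illustrated with a short sketch. The flag state and EEZ labels and all counts below are placeholders rather than results, and expressing the comparison as a difference in percentage points is an assumption made for illustration.

```python
def to_percent(counts):
    """Convert per-pair totals into percentages of the overall total."""
    total = sum(counts.values())
    return {pair: 100.0 * value / total for pair, value in counts.items()}


def excess_disabling(disabling_events, fishing_days):
    """Percentage-point difference between each pair's share of disabling and of fishing."""
    disabling_pct = to_percent(disabling_events)
    fishing_pct = to_percent(fishing_days)
    pairs = set(disabling_pct) | set(fishing_pct)
    return {p: disabling_pct.get(p, 0.0) - fishing_pct.get(p, 0.0) for p in pairs}


# Placeholder (flag state, EEZ) pairs with made-up totals.
disabling_events = {("flag_A", "eez_X"): 30, ("flag_B", "eez_Y"): 70}
fishing_days = {("flag_A", "eez_X"): 50, ("flag_B", "eez_Y"): 50}
excess = excess_disabling(disabling_events, fishing_days)
```

A positive value flags a pair whose share of EEZ-adjacent disabling exceeds its share of EEZ-adjacent fishing.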
In addition, we explored the relative percentages of suspected disabling and fishing activity within or adjacent to EEZs with overlapping claims. The EEZ shapefile contains an attribute for polygon type ("200NM", "Joint regime", "Overlapping claim"). Disabling events and fishing activity that occurred within 100 kilometers of EEZs were associated with the nearest EEZ, and the percentage of these events within or adjacent to EEZs with overlapping claims was calculated. Four percent of disabling events and 2% of fishing activity were within or adjacent to EEZs with overlapping claims, which comprise 11% of EEZ polygons. A two proportion Z-test indicated that these percentages were significantly different (p-value < 0.0001). EEZs with overlapping claims that had the most disabling events were the Chagos Archipelago (claimed by the UK and Mauritius), the Falkland/Malvinas Islands (claimed by the UK and Argentina), and the Kuril Islands (claimed by Japan and Russia).

Supplementary Tables
Table S1. Descriptions of the rule-based classification models.